With ever-growing model sizes and the limited availability of labeled training data, transfer learning has become an increasingly popular approach in many science and engineering domains. For classification problems, this work delves into the mystery of transfer learning through an intriguing phenomenon termed neural collapse (NC), where the last-layer features and classifiers of learned deep networks satisfy: (i) the within-class variability of the features collapses to zero, and (ii) the between-class feature means are maximally and equally separated. Through the lens of NC, our findings for transfer learning are the following: (i) when pre-training models, preventing within-class variability collapse (to a certain extent) better preserves the intrinsic structures of the input data and thus leads to better model transferability; (ii) when fine-tuning models on downstream tasks, obtaining features with more NC on the downstream data results in better test accuracy on the given task. These results not only demystify many widely used heuristics in model pre-training (e.g., data augmentation, projection head, self-supervised learning), but also lead to a more efficient and principled fine-tuning method for downstream tasks, which we demonstrate through extensive experimental results.
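To make the two NC properties measurable, the sketch below (our own illustration, not code from the paper) computes a standard within-class variability statistic and the pairwise cosine similarities of centered class means from last-layer features; the pseudo-inverse-based NC1 definition is an assumption drawn from common practice.

```python
import numpy as np

def neural_collapse_metrics(features, labels):
    """Rough NC diagnostics for last-layer features (one row per sample).

    Returns (1) within-class variability measured against between-class
    scatter (smaller means stronger collapse) and (2) pairwise cosine
    similarities of centered class means (equal, maximally negative values
    indicate an equiangular, maximally separated geometry).
    """
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])

    # Within-class and between-class covariance matrices.
    sigma_w = np.zeros((features.shape[1], features.shape[1]))
    for c, mu_c in zip(classes, class_means):
        diff = features[labels == c] - mu_c
        sigma_w += diff.T @ diff / features.shape[0]
    centered = class_means - global_mean
    sigma_b = centered.T @ centered / len(classes)

    # NC1: within-class variability relative to between-class scatter.
    nc1 = np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes)

    # NC2: cosine similarities between centered class means.
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cosines = normed @ normed.T
    return nc1, cosines[np.triu_indices(len(classes), k=1)]
```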
Time series anomaly detection strives to uncover potential abnormal behaviors and patterns from temporal data, and has fundamental significance in diverse application scenarios. Constructing an effective detection model usually requires adequate training data stored in a centralized manner; however, this requirement cannot always be satisfied in realistic scenarios. As a prevailing approach to this problem, federated learning has demonstrated its power to cooperate with distributed data while protecting the privacy of data providers. However, it remains unclear how existing time series anomaly detection algorithms perform with decentralized data storage and privacy protection through federated learning. To study this, we construct a federated time series anomaly detection benchmark, named FedTADBench, which involves five representative time series anomaly detection algorithms and four popular federated learning methods. We aim to answer the following questions: (1) How do time series anomaly detection algorithms perform when combined with federated learning? (2) Which federated learning method is the most appropriate for time series anomaly detection? (3) How do federated time series anomaly detection approaches perform on different partitions of data across clients? Extensive experiments with various settings provide numerous results and corresponding analysis. The source code of our benchmark is publicly available at https://github.com/fanxingliu2020/FedTADBench.
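For context on the federated setting evaluated in the benchmark, the following sketch shows one generic FedAvg-style round in which each client trains its local anomaly-detection model and the server averages the parameters, weighted by client sample counts. It is an illustrative simplification with assumed helper functions, not FedTADBench code.

```python
import copy

def fedavg_round(global_params, clients, local_train, num_samples):
    """One illustrative FedAvg round over anomaly-detection clients.

    global_params: dict of parameter name -> numpy array
    clients: iterable of client datasets (each client keeps its data locally)
    local_train: function(params, data) -> updated params (runs on the client)
    num_samples: function(data) -> number of local samples (for weighting)
    """
    updates, weights = [], []
    for data in clients:
        # Each client starts from the current global model and trains locally.
        local_params = local_train(copy.deepcopy(global_params), data)
        updates.append(local_params)
        weights.append(num_samples(data))

    # Server aggregation: sample-size weighted average of client parameters.
    total = sum(weights)
    return {
        name: sum(w * p[name] for w, p in zip(weights, updates)) / total
        for name in global_params
    }
```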
Current computer vision models, unlike the human visual system, cannot yet achieve general-purpose visual understanding. Existing efforts to create a general vision model are limited in the scope of assessed tasks and offer no overarching framework to perform them holistically. We present a new comprehensive benchmark, General-purpose Visual Understanding Evaluation (G-VUE), covering the full spectrum of visual cognitive abilities with four functional domains: Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, from 3D reconstruction to visual reasoning and manipulation. Along with the benchmark, we provide a general encoder-decoder framework that allows arbitrary visual representations to be evaluated on all 11 tasks. We evaluate various pre-trained visual representations with our framework and observe that (1) Transformer-based visual backbones generally outperform CNN-based backbones on G-VUE, and (2) visual representations from vision-language pre-training are superior to those from vision-only pre-training across visual tasks. With G-VUE, we provide a holistic evaluation standard to motivate research toward building general-purpose visual systems via more general-purpose visual representations.
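A minimal sketch of the shared encoder-decoder evaluation pattern mentioned above, assuming one pre-trained visual backbone and a lightweight task-specific decoder per task; the class name, linear heads, and task names are illustrative, not the G-VUE API.

```python
import torch.nn as nn

class EncoderDecoderProbe(nn.Module):
    """Illustrative probe: one shared visual encoder, one decoder per task."""

    def __init__(self, encoder, feature_dim, task_heads):
        super().__init__()
        self.encoder = encoder              # any pre-trained visual backbone
        self.decoders = nn.ModuleDict({
            task: nn.Linear(feature_dim, out_dim)   # simple heads, for illustration
            for task, out_dim in task_heads.items()
        })

    def forward(self, images, task):
        features = self.encoder(images)       # shared visual representation (B, feature_dim)
        return self.decoders[task](features)  # task-specific prediction

# Example with hypothetical task heads:
# probe = EncoderDecoderProbe(backbone, feature_dim=768,
#                             task_heads={"depth": 1, "vqa": 3129, "navigation": 4})
```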
Dense pose estimation is a dense 3D prediction task for instance-level human analysis, aiming to map human pixels from an RGB image to a 3D surface of the human body. Due to the large amount of surface point regression, the training process is prone to collapse compared to other region-based human instance analysis tasks. By analyzing the loss formulation of existing dense pose estimation models, we introduce a novel point regression loss function, named Dense Points loss, to stabilize the training process, and a new balanced loss weighting strategy to handle the multi-task losses. With the above novelties, we propose a brand new architecture, named UV R-CNN. Without auxiliary supervision or external knowledge from other tasks, UV R-CNN can handle many complicated issues in dense pose model training, achieving 65.0% $AP_{gps}$ and 66.1% $AP_{gpsm}$ on the DensePose-COCO validation subset with a ResNet-50-FPN feature extractor, competitive with state-of-the-art dense human pose estimation methods.
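The abstract does not give the explicit form of the Dense Points loss or the weighting strategy. As a generic illustration of balanced multi-task loss weighting (not the UV R-CNN formulation), the following sketch rescales each task loss by a detached running estimate of its magnitude so that no single term dominates training; all names are ours.

```python
import torch

def balanced_multitask_loss(losses, running_scales, momentum=0.9):
    """Generic balanced weighting: divide each task loss by a running estimate
    of its magnitude so all tasks contribute on a comparable scale.

    losses: dict of task name -> scalar loss tensor
    running_scales: dict of task name -> float (updated in place)
    """
    total = 0.0
    for task, loss in losses.items():
        scale = running_scales.get(task, float(loss.detach()))
        scale = momentum * scale + (1.0 - momentum) * float(loss.detach())
        running_scales[task] = scale
        total = total + loss / max(scale, 1e-8)   # scale is detached: no gradient through it
    return total
```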
When training over-parameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, the within-class features of each class converge to their mean, while the means of different classes exhibit a certain tight frame structure that is also aligned with the classifiers of the last layer. As feature normalization in the last layer has become a common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon for normalized features. Based on the unconstrained feature model, we simplify the empirical loss function of the multi-class classification task by constraining all features and classifiers to lie on a sphere. In this setting, we analyze the non-convex landscape of the Riemannian optimization problem over a product of spheres, showing a benign global landscape in the sense that the only global minimizers are the neural collapse solutions, while all other critical points are strict saddle points. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
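As a rough rendering of the constrained problem described above, under the unconstrained feature model the simplified objective can be written in the following general form, where $K$ is the number of classes, $n$ the number of training samples per class, $h_{k,i}$ the feature of the $i$-th sample in class $k$, and $w_1, \dots, w_K$ the last-layer classifiers; the exact scaling constants and sphere radii used in the paper are omitted here.

$$
\min_{\{w_k\},\,\{h_{k,i}\}} \;\; \frac{1}{nK} \sum_{k=1}^{K} \sum_{i=1}^{n} \mathcal{L}_{\mathrm{CE}}\big(W h_{k,i},\, y_k\big)
\qquad \text{s.t.} \quad \|w_k\|_2 = 1, \;\; \|h_{k,i}\|_2 = 1 \;\; \text{for all } k, i,
$$

where $W = [w_1, \dots, w_K]^{\top}$ and $y_k$ denotes the label of class $k$.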
In this technical report, we present our solution, dubbed MV-FCOS3D++, for the camera-only 3D detection track of the Waymo Open Dataset Challenge 2022. For multi-view camera-only 3D detection, approaches based on bird's-eye-view or 3D geometric representations can leverage stereo cues from the overlapping regions between adjacent views and perform 3D detection directly without hand-crafted post-processing. However, they lack direct semantic supervision for the 2D backbones, which can be complemented by pre-training a simple monocular detector. Our solution is a multi-view framework for 4D detection following this paradigm. It is built upon a simple monocular detector, FCOS3D++, pre-trained only with object annotations from Waymo, and converts multi-view features into a 3D grid space to detect 3D objects on it. A dual-path neck for single-frame understanding and temporal stereo matching is designed to incorporate multi-frame information. Our method finally achieves 49.75% mAPL with a single model and wins 2nd place in the WOD Challenge, without any LiDAR-based depth supervision during training. The code will be released at https://github.com/tai-wang/depth-from-motion.
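To make the step of converting multi-view features into a 3D grid space concrete, here is a heavily simplified sketch of one common lifting scheme: project each voxel center into every camera, sample the 2D feature map there, and average across views. This is our illustration of the general idea under assumed inputs, not the MV-FCOS3D++ implementation.

```python
import numpy as np

def lift_features_to_voxels(feature_maps, projections, voxel_centers):
    """Simplified multi-view lifting: average 2D features sampled at the image
    locations where each 3D voxel center projects.

    feature_maps:  list of (H, W, C) arrays, one per camera
    projections:   list of (3, 4) camera projection matrices (world -> pixels)
    voxel_centers: (N, 3) array of 3D voxel center coordinates
    """
    n, c = voxel_centers.shape[0], feature_maps[0].shape[2]
    accum, counts = np.zeros((n, c)), np.zeros((n, 1))

    homog = np.concatenate([voxel_centers, np.ones((n, 1))], axis=1)  # (N, 4)
    for feats, proj in zip(feature_maps, projections):
        uvw = homog @ proj.T                       # project voxel centers into this camera
        in_front = uvw[:, 2] > 1e-3
        u = np.round(uvw[:, 0] / np.maximum(uvw[:, 2], 1e-3)).astype(int)
        v = np.round(uvw[:, 1] / np.maximum(uvw[:, 2], 1e-3)).astype(int)
        h, w = feats.shape[:2]
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        accum[valid] += feats[v[valid], u[valid]]  # nearest-neighbor sampling
        counts[valid] += 1

    return accum / np.maximum(counts, 1)           # (N, C) voxel features
```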
Recent studies have shown that deep neural network-based recommender systems are vulnerable to adversarial attacks, where attackers inject carefully crafted fake user profiles (i.e., sets of items that fake users have interacted with) into a target recommender system to achieve malicious purposes, such as promoting or demoting a set of target items. Due to security and privacy concerns, it is more practical to perform adversarial attacks under the black-box setting, where the architecture/parameters and training data of the target system cannot be easily accessed by attackers. However, generating high-quality fake user profiles under the black-box setting, with limited resources for probing the target system, is challenging. To address this challenge, in this work we introduce a novel strategy that leverages items' attribute information (i.e., an item knowledge graph), which is publicly accessible and provides rich auxiliary knowledge to enhance the generation of fake user profiles. More specifically, we propose a knowledge-enhanced black-box attack framework (KGAttack) that effectively learns attack policies through deep reinforcement learning, in which the knowledge graph is seamlessly integrated into a hierarchical policy network to generate fake user profiles for performing adversarial black-box attacks. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of the proposed attack framework under the black-box setting.
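As a toy illustration of how an item knowledge graph can guide fake-profile construction (a drastic simplification of the hierarchical reinforcement-learning policy described above), the sketch below samples filler items from the KG neighborhood of the target item according to a learned scoring policy; all names and the reward loop mentioned in the trailing comment are assumptions.

```python
import random

def generate_fake_profile(target_item, kg_neighbors, item_scores, profile_len=20):
    """Toy illustration: build a fake user profile by sampling filler items from
    the knowledge-graph neighborhood of the target item, preferring items that a
    simple scoring policy currently rates highly.

    kg_neighbors: dict item -> list of items connected via KG relations
    item_scores:  dict item -> learned preference score of the attack policy
    """
    profile = [target_item]
    frontier = list(kg_neighbors.get(target_item, []))
    while frontier and len(profile) < profile_len:
        # Sample filler items in proportion to the policy's current scores.
        weights = [max(item_scores.get(i, 1e-3), 1e-3) for i in frontier]
        item = random.choices(frontier, weights=weights, k=1)[0]
        frontier.remove(item)
        profile.append(item)
        frontier.extend(i for i in kg_neighbors.get(item, []) if i not in profile)
    return profile

# A full attack would inject such profiles, query the black-box recommender, and
# update item_scores from the observed promotion of the target item (the RL reward).
```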
Conversational recommender systems (CRS) aim to capture users' current intents and provide recommendations through real-time multi-turn conversational interactions. As human-machine interactive systems, CRS must improve the user experience; however, most CRS methods ignore its importance. In this paper, we propose two key points for CRS to improve the user experience: (1) speaking like a human, since humans can speak in different styles according to the current dialogue context; (2) identifying fine-grained intents, since even for the same utterance, different users have multiple fine-grained intents that are related to their inherent preferences. Based on these observations, we propose a novel CRS model, the Customized Conversational Recommender System (CCRS), which customizes the CRS model for users from three perspectives. For human-like dialogue services, we propose a multi-style dialogue response generator that selects context-aware speaking styles for utterance generation. To provide personalized recommendations, we extract users' current fine-grained intents from the dialogue context under the guidance of their inherent preferences. Finally, to customize the model parameters for each user, we train the model from a meta-learning perspective. Extensive experiments and a series of analyses demonstrate the superiority of our CCRS on both the recommendation and dialogue services.
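The abstract only states that the model parameters are customized per user from a meta-learning perspective; the sketch below shows a generic MAML-style loop in which each user's dialogue data forms a task, a few inner gradient steps adapt the shared parameters to that user, and the outer loop updates the shared initialization. It is an illustrative stand-in, not the CCRS training procedure.

```python
import torch

def maml_style_user_adaptation(model, user_tasks, inner_lr=0.01, outer_lr=0.001, inner_steps=1):
    """Generic MAML-style customization: each user is a task; a few inner gradient
    steps adapt the shared parameters to that user, and the outer loop updates the
    shared initialization. Illustrative only.

    user_tasks: list of (support_batch, query_batch, loss_fn) per user, where
                loss_fn(model, batch, fast_weights) evaluates the model with the
                given per-user fast weights.
    """
    outer_opt = torch.optim.Adam(model.parameters(), lr=outer_lr)
    for support, query, loss_fn in user_tasks:
        # Inner loop: user-specific fast weights from the shared initialization.
        fast_weights = {n: p.clone() for n, p in model.named_parameters()}
        for _ in range(inner_steps):
            loss = loss_fn(model, support, fast_weights)
            grads = torch.autograd.grad(loss, list(fast_weights.values()), create_graph=True)
            fast_weights = {n: w - inner_lr * g
                            for (n, w), g in zip(fast_weights.items(), grads)}
        # Outer loop: update shared parameters with this user's query loss.
        outer_opt.zero_grad()
        loss_fn(model, query, fast_weights).backward()
        outer_opt.step()
```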
This paper aims to jointly address two seemingly conflicting issues in federated learning: differential privacy (DP) and Byzantine robustness, which are particularly challenging when the distributed data are non-i.i.d. (independent and identically distributed). Standard DP mechanisms add noise to the transmitted messages, which becomes entangled with the robust stochastic gradient aggregation used to defend against Byzantine attacks. In this paper, we decouple these two issues via robust stochastic model aggregation, in the sense that the impacts of our proposed DP mechanisms and of the defense against Byzantine attacks on the learning performance are separated. At each iteration, leveraging robust stochastic model aggregation, each worker computes the difference between its local model and the global model and then sends the element-wise signs to the master node, which enables robustness to Byzantine attacks. Furthermore, we design two DP mechanisms that perturb the uploaded signs to preserve privacy, and prove that they are $(\epsilon, 0)$-DP by exploiting the properties of the noise distributions. With the tools of the Moreau envelope and proximal projection, we establish the convergence of the proposed algorithm when the cost function is non-convex. We analyze the trade-off between privacy preservation and learning performance, and show that the impacts of our proposed DP mechanisms are decoupled from robust stochastic model aggregation. Numerical experiments demonstrate the effectiveness of the proposed algorithm.
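A minimal sketch of the sign-based robust aggregation described above, with a randomized-response-style sign flip standing in for the DP perturbation (flipping each sign independently with probability $1/(1+e^{\epsilon})$ gives $\epsilon$-DP per coordinate); the paper proposes two specific mechanisms, so this is only one plausible instantiation with assumed names.

```python
import numpy as np

def worker_update(local_model, global_model, epsilon, rng):
    """Each worker sends only DP-perturbed signs of its model difference."""
    signs = np.sign(local_model - global_model)
    # Randomized-response-style perturbation: flip each sign with probability
    # 1 / (1 + e^eps), which yields eps-DP for each uploaded coordinate.
    flip = rng.random(signs.shape) < 1.0 / (1.0 + np.exp(epsilon))
    return np.where(flip, -signs, signs)

def master_aggregate(global_model, worker_signs, step_size):
    """Robust aggregation: move the global model along the coordinate-wise
    majority sign of the workers' (perturbed) model differences."""
    majority = np.sign(np.sum(worker_signs, axis=0))
    return global_model + step_size * majority

# Illustrative round with synthetic models.
rng = np.random.default_rng(0)
global_model = np.zeros(5)
local_models = [global_model + rng.normal(size=5) for _ in range(8)]
signs = [worker_update(m, global_model, epsilon=1.0, rng=rng) for m in local_models]
global_model = master_aggregate(global_model, np.stack(signs), step_size=0.01)
```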
Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performance of modern machine learning. However, when the training data are corrupted, it is well known that over-parameterized networks tend to overfit and fail to generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks on classification tasks where a portion of the training labels are corrupted. The main idea is remarkably simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise with another sparse over-parameterized term and exploit implicit algorithmic regularization to recover and separate the underlying corruptions. Remarkably, when trained with such a simple method in practice, our approach achieves state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, which shows exact separation between sparse noise and low-rank data under incoherence conditions. This work opens up many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
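To make the idea of modeling label noise with a sparse over-parameterized term concrete, here is a toy training step in which each sample carries a correction $u_i \odot u_i - v_i \odot v_i$ added to the network output before the loss against the possibly corrupted label, relying on the implicit bias of gradient descent to keep that term sparse. The exact parameterization and where the term enters differ in the paper, so treat this as an assumption-laden sketch.

```python
import torch
import torch.nn.functional as F

def robust_step(model, opt, batch_idx, inputs, noisy_labels, u, v, noise_lr=1.0):
    """Toy robust-training step (illustrative, not the paper's exact formulation).

    A per-sample over-parameterized correction s_i = u_i * u_i - v_i * v_i is added
    to the logits so that the implicit bias of gradient descent can absorb sparse
    label noise into s instead of the network weights.

    u, v: (num_samples, num_classes) leaf tensors with requires_grad=True
    batch_idx: indices of the current batch's rows in u and v
    """
    logits = model(inputs) + u[batch_idx] ** 2 - v[batch_idx] ** 2
    loss = F.cross_entropy(logits, noisy_labels)

    opt.zero_grad()
    for t in (u, v):
        if t.grad is not None:
            t.grad.zero_()
    loss.backward()
    opt.step()                      # update network weights
    with torch.no_grad():           # plain SGD on the per-sample noise variables
        u -= noise_lr * u.grad
        v -= noise_lr * v.grad
    return loss.item()
```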